A complete guide to the statistical framework behind experimentation, followed by a real-world case study and an interactive tool to run your own tests.
Modern A/B testing is only reliable when experiments are designed with clear statistical guardrails. We don't just "run a test"; we run a power analysis first to ensure the experiment can actually detect a real effect.
| Reality \ Decision | Detect Effect | No Effect Detected |
|---|---|---|
| Effect Exists | True Positive (Power) | False Negative (β) |
| No Effect Exists | False Positive (α) | True Negative |
The Core Formula:

$$n = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left[\, p_1(1-p_1) + p_2(1-p_2) \,\right]}{(p_1 - p_2)^2}$$

Where $n$ is the minimum sample size required per variant to detect a difference between $p_1$ (Control) and $p_2$ (Test), $z_{1-\alpha/2}$ is the critical value for the chosen significance level $\alpha$, and $z_{1-\beta}$ is the critical value for the desired power $1-\beta$.
A marketing team wants to test whether a new checkout design improves conversion rate. Before launching, they need to know: "How many users do we need?"
Conclusion: We need ~8,150 users per variant. If we launch with fewer, the test is underpowered: a null result would be inconclusive rather than evidence that the new design has no effect.
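The calculation above can be sketched in a few lines of Python. The specific inputs are not stated in this section, so the values below (5% baseline conversion, 6% target, $\alpha = 0.05$, 80% power) are assumptions chosen to be consistent with the ~8,150 figure; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Minimum sample size per variant to detect p1 (Control) vs p2 (Test)
    with a two-sided z-test at significance `alpha` and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance term
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Assumed case-study parameters: 5% baseline, targeting 6% conversion
n = sample_size_two_proportions(0.05, 0.06)
print(n)  # → 8155, i.e. the ~8,150 per variant quoted above
```

Note that halving the detectable lift roughly quadruples the required sample size, since the effect size $(p_1 - p_2)$ enters the denominator squared.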
Adjust the parameters below and click "Run Analysis". If your sample size is below the required minimum, the result will be flagged.